Bayesian compositional regression with microbiome features via variational inference

The microbiome plays a key role in the health of the human body. Interest often lies in finding features of the microbiome, alongside other covariates, which are associated with a phenotype of interest. One important property of microbiome data, which is often overlooked, is its compositionality as it can only provide information about the relative abundance of its constituting components. Typically, these proportions vary by several orders of magnitude in datasets of high dimensions. To address these challenges we develop a Bayesian hierarchical linear log-contrast model which is estimated by mean field Monte-Carlo co-ordinate ascent variational inference (CAVI-MC) and easily scales to high dimensional data. We use novel priors which account for the large differences in scale and constrained parameter space associated with the compositional covariates. A reversible jump Monte Carlo Markov chain guided by the data through univariate approximations of the variational posterior probability of inclusion, with proposal parameters informed by approximating variational densities via auxiliary parameters, is used to estimate intractable marginal expectations. We demonstrate that our proposed Bayesian method performs favourably against existing frequentist state of the art compositional data analysis methods. We then apply the CAVI-MC to the analysis of real data exploring the relationship of the gut microbiome to body mass index. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-023-05219-x.


CAVI-MC Updates
This section contains all of the variation inference updates for the CAVI-MC.
By imposing a constraint on θ we introduce a covariance between the elements θ j which we capture within the mean field family. The joint posterior is p(y, ϑ) =p(y|·) × s p(β s |w, γ s ) × s p(γ s |ω) × g p(ζ g , χ g ) × p(χ g |ϱ) Define the mean-field approximation distribution as with f (ϑ) (j) as the j-th moment of f (ϑ) with respect to q(ϑ), E q [f (ϑ) j ].
By defining a block in the mean field approximation as a multivariate density q(θ, ξ), this allows us to incorporate correlation between the elements in θ (and the corresponding elements in ξ) related to the compositional explanatory variables and the correlation between θ j and ξ j . Now the expectation is with respect to the vector.

CAVI Updates
The CAVI update is proportional to By exponentiating and completing the square we have where cst is a constant with respect to β s and γ s . The spike-and-slab prior forces the latent selection variables into the likelihood component By exponentiating and completing the square we arrive at and thus by calling Note that now (1.28) The index g denotes the categorical factor groupings g = 1, ..., G and m g is the dimension of the vector ζ g . As the categorical factors are coded with reference to the intercept, m g is always 1 less than the levels in the categorical factor.
and the vector µ θ ξ and matrix Σ θ ξ Unlike in the β s updates for the free variational parameters, these are still function of the vector ξ. On completing the square we have We can remove the index by adding the constraint on µ θ ξ and Σ θ ξ with the matrix T ξ .
log q(θ, ψ, ξ|y, We can then identify the singular multivariate normal density log q(θ, ψ, ξ|y, which can be expressed as We can identify the singular multivariate normal density (1.40) which is a function of ξ and ψ. The ξ and ψ component (1.41) contains terms which do not have a conjugate update. The first term has dependencies on ξ in µ θ ξ and Σ θ ξ which are a function of ψ and the remaining q expectations. Thus which implies that As the update for ξ from the construction of the MCMC and the SMVN is the update can be solved in closed form, using the MCMC marginal expectations.
The update for q(ϱ) is

Pseudo Updates
The pseudo updates are derived in full. The prior parametrisation is The joint update q(Ω j , Υ j ) is which gives us the update The terms in the q(Υ j ), using ∆ j = 0 when Υ j = 0, are proportional to Which after normalisation is Note that now (1.106) The approximating q density for ∆ j , which is proportional to ∆ j but conditional on Υ j This gives The auxiliary parameters create an alternative DAG which is updated via a "separate branch" of pseudo updates which helps us to approximate the model in order to guide the MCMC step. These updates are refined at each iteration by the full VI updates which account for the constraint. The "sparsity" parameter κ and the hyperparameters a ∆ , b ∆ which are set to a ψ , b ψ provide a link back to the constrained model. Figure S1: Diagram depicting the order and structure of the CAVI updates. Although the CAVI-MC permits any order, the pseudo updates for the auxiliary parameters help guide the MCMC and are performed directly before the q(θ, ψ, ξ) MC update. The pseudo updates for an unconstrained model are in the dashed box and branch off prior to the joint q(θ, ψ, ξ) update. The q approximating densities q(∆ j |Υ j = 1) are then used to guide the MCMC step.

ELBO
The objective of VI is to find the candidate from a family of densities D which best approximates, the one closest in KL divergence, to exact conditional q * (ϑ) = arg min q * (ϑ)∈D

KL(q(ϑ)||p(ϑ|y))
This objective is not computable as it requires computing marginal likelihood. If we expand the expression we can identify the elements which are a function of the parameters in the model. As the KL cannot be computed, an alternative objective that is equivalent to the KL up to an added constant is the evidence lower bound (ELBO).
This function is the negative KL divergence plus the marginal likelihood, and is optimised at each iteration of the CAVI in order to monitor its convergence. The computational details are: The functions are where ||u|| 2 is defined in (1.82).
The approximating density is only known up to a constant of proportionality but this is sufficient for the ELBO calculations.
The q expectations (ξ j log(ψ j )) (1) and (ξ j ψ −1 j ) (1) can be found using the law of iterative expectations but these will cancel. The free parameters are a function of ξ so when we take an expectation we have Bringing together the expression forB

RJMCMC moves and model proposals
This section explains the RJMCMC moves in detail. In the RJMCMC the proposal for ψ j |ξ j = 1 is from the q approximating density of the auxiliary parameter Ω j , where the free parameters are obtained from the pseudo updates. As q(θ|ψ, ξ) is available in closed form, we are able to sample directly from it. Since the proposals do not depend on their current values, this leads to a reverse move which is a random function and thus a Jacobian which is equal to 1.
The RJMCMC involves the following steps: • Select a birth-death or swap move with probability ϕ, 1 − ϕ.
• Generate u from our proposal density g where h is a specified invertible mapping function.
• Accept the proposed move to model ξ ′ with probability where the target is in the square parenthesis.
The acceptance probability for the RJMCMC between-model move, as the Jacobian is equal to 1, simplifies to where j m (ξ, ξ ′ ) is the proposal probability for the latent variable selection parameter ξ ′ (which depends on the move type and the data) and As described in the main paper, a univariate approximation is used to calculate j(ξ, ξ ′ ) in the birth-death or swap move of the RJMCMC.

Birth-death and swap moves
To guide the RJMCMC over a large binary space, we use a univariate approximatioñ p(ξ j = 1|ϑ) of the joint approximating density q(ψ, ξ) relative to the jth element. The probability of a new model j m (ξ, ξ ′ ) is a function of this approximation and the move type.
Each time a variable is selected for (or removed from) the model, the remaining approximate probabilities proposal for all elements outside of the model must be renormalised.
The normalised probabilities for a variable h to be selected for the model, the birth move where anyp(ξ j = 1|ϑ) below a small threshold ε b (set at 1 × 10 −30 ) is replaced by ε b to avoid zero probabilities. The normalised probabilities to remove a variable h from the model M, the death move is as we select the variables to remove with probability inversely proportional to the approximate probability of inclusion. ε d guarantees that the probabilities are comparable when they are close to the limit of their domain. The difference between the groups is relative to the size of ε d .
If i is the current iteration, define j (ξ j ) [i] = (d ξ ) [i] the size of the current model in the MCMC, the proposal is generated in the following way: Sample (birth-death) and swap with probability ϕ and 1−ϕ respectively if 2 ≤ (d ξ ) [i] < d: • (Birth-Death) Sample uniformly birth or death: -(Birth): If (d ξ ) [i] = 0 add 2 variables else add 1.
When we add two elements h and l the order is not important. As the probability of selecting each element is not the same, we have to add the probabilities so that is the probability of choosing element h first and element l second plus the probability of choosing element l first and element h second (the order is in the bracket).
• (Swap): -Sample a variable included in the model h and swap with one outside l. (2.10)

Within-model moves
Within-model samples are included so that both ψ and θ are sampled sufficiently. This enables the calculation of q expectations within the ELBO and the free parameter updates for q(σ 2 ). Its is particularly important when estimating ||u|| (2) as the calculation has to be split into its component parts, because the latent variables which perform variable selection need to be incorporated for the expectations. If θ|ξ, ψ has not been sampled sufficiently to estimate E q [θ T ξ Z T ξ Z ξ θ ξ ], then the cross product terms may not be sufficiently large enough to prevent the dot product from having a negative value.

Proofs
Here are some simple proofs of the results used in the derivations.

Proof: Simplification of the constraint matrix
We can simplify the calculations. TT T = TT = T. If we define the matrix Then for the diagonal component of TT we either have entries corresponding to the dot Using the matrix determinant lemma where A is an invertible square matrix and u, v we can prove that the determinant of this matrix is zero. Express T as 3.2 Proof: Eigenvalues of T comprise of d − 1 1's and one 0.
To find the eigenvalues of T need to solve det(T − λI) = 0 (3.6) for λ. Using the lemma in Equation (3.3) and T − λI = A + uv T where Therefore the eigenvalues for T are

Proof: T = T +
Using the SVD T can be expressed as U ΛV . As T is symmetric U ΛV = U ΛU . The pseudo inverse is This approach can also be used to solve the pseudo determinant det*(θT) (where θ is a scalar) which is a product of the non-zero eigenvalues. The eigenvalues of the scaled matrix can be found solving det(θT − λI) = 0.
The eigenvalues, found by setting this expression to zero are Thus the expression det*(2πwT ) = (2πw) d−1 .
(3.15) 4 Tables   Tables 9 to 16 contain the squared bias diagnostic for the additive-log-ratio simulation experiment for each signal-to-noise ratio (SNR).         Figure